THRIFT-5587: Add UUID support for PHP#3332

Open
sveneld wants to merge 2 commits into apache:master from sveneld:php-uuid-support

sveneld (Contributor) commented Mar 7, 2026

Summary

Implements UUID as a first-class type in the PHP library and code generator, as part of THRIFT-5587.

  • UUID is represented as a canonical string (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) in PHP, similar to the Python implementation
  • Wire format: fixed 16 bytes in big-endian order (binary and compact protocols)
  • JSON protocol: UUID serialized as a JSON string in canonical form
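As a rough illustration of this mapping (not the PHP library's code), the C++ sketch below converts the canonical 36-character string into the 16 wire bytes and back. The function names `hex_nibble`, `uuid_to_wire` and `wire_to_uuid` are hypothetical. The sketch assumes bytes are emitted in the order the hex digits appear, and it always renders lowercase hex; how the PHP implementation normalizes case is not stated in this PR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Parse one hex digit; rejects anything outside [0-9a-fA-F].
inline int hex_nibble(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    throw std::invalid_argument("non-hex digit in UUID");
}

// Canonical form (8-4-4-4-12) -> 16 wire bytes, in the order the digits appear.
std::array<std::uint8_t, 16> uuid_to_wire(const std::string& s) {
    if (s.size() != 36) throw std::invalid_argument("UUID must be 36 characters");
    std::string hex;  // the 32 hex digits with the hyphens removed
    hex.reserve(32);
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i == 8 || i == 13 || i == 18 || i == 23) {
            if (s[i] != '-') throw std::invalid_argument("hyphen expected in UUID");
        } else {
            hex.push_back(s[i]);
        }
    }
    std::array<std::uint8_t, 16> out{};
    for (std::size_t b = 0; b < out.size(); ++b) {
        out[b] = static_cast<std::uint8_t>((hex_nibble(hex[2 * b]) << 4) |
                                           hex_nibble(hex[2 * b + 1]));
    }
    return out;
}

// 16 wire bytes -> canonical lowercase form, with hyphens before bytes 4, 6, 8 and 10.
std::string wire_to_uuid(const std::array<std::uint8_t, 16>& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string s;
    s.reserve(36);
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (i == 4 || i == 6 || i == 8 || i == 10) s.push_back('-');
        s.push_back(digits[bytes[i] >> 4]);
        s.push_back(digits[bytes[i] & 0x0F]);
    }
    return s;
}
```

For `550e8400-e29b-41d4-a716-446655440000`, the first wire bytes are `0x55` and `0x0e`. This matches the `550e8400e29b41d4a716446655440000` dump reported in the cross-language check.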

Changes

PHP Library:

  • TType::UUID = 16 constant
  • writeUuid()/readUuid() in TBinaryProtocol, TCompactProtocol, TJSONProtocol, TSimpleJSONProtocol, TProtocolDecorator
  • UUID skip support in TProtocol (skip() and skipBinary())
  • Compact protocol type mapping (COMPACT_UUID = 0x0D)
  • TBase $tmethod mapping for automatic read/write dispatch
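The type-id bookkeeping in the list above can be sketched in C++. This is an illustrative re-declaration of the numeric ids, not the PHP `TType` or `TCompactProtocol` classes. UUID keeps TType id 16 in the binary protocol and maps to compact type nibble 0x0D. It also occupies a fixed 16 bytes, which is all a `skip()` implementation needs to know. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Binary-protocol type ids (a subset of Thrift's TType values).
enum class TType : std::uint8_t {
    BOOL = 2, BYTE = 3, DOUBLE = 4, I16 = 6, I32 = 8, I64 = 10,
    STRING = 11, STRUCT = 12, MAP = 13, SET = 14, LIST = 15, UUID = 16
};

// Compact-protocol type nibble for each TType.
std::uint8_t to_compact_type(TType t) {
    switch (t) {
        case TType::BOOL:   return 0x01;  // BOOLEAN_TRUE; in field headers the nibble also carries the value
        case TType::BYTE:   return 0x03;
        case TType::I16:    return 0x04;
        case TType::I32:    return 0x05;
        case TType::I64:    return 0x06;
        case TType::DOUBLE: return 0x07;
        case TType::STRING: return 0x08;  // BINARY
        case TType::LIST:   return 0x09;
        case TType::SET:    return 0x0A;
        case TType::MAP:    return 0x0B;
        case TType::STRUCT: return 0x0C;
        case TType::UUID:   return 0x0D;
    }
    throw std::invalid_argument("unmapped TType");
}

// Short-form compact field header: field-id delta in the high nibble, type in the low nibble.
std::uint8_t compact_field_header(int delta, TType t) {
    if (delta < 1 || delta > 15) throw std::out_of_range("delta needs the long-form header");
    return static_cast<std::uint8_t>((delta << 4) | to_compact_type(t));
}

// Bytes to skip for fixed-width values in the binary protocol (-1 = variable length).
int binary_fixed_width(TType t) {
    switch (t) {
        case TType::BOOL:
        case TType::BYTE:   return 1;
        case TType::I16:    return 2;
        case TType::I32:    return 4;
        case TType::I64:
        case TType::DOUBLE: return 8;
        case TType::UUID:   return 16;
        default:            return -1;
    }
}
```

Take a UUID field whose id is one greater than the previous field's. Its short-form compact header byte is `0x1D`: delta 1 in the high nibble, type 0x0D in the low nibble. The 16 UUID bytes follow the header.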

Code Generator (t_php_generator.cc):

  • TYPE_UUID cases in all switch statements: type_to_enum, type_to_cast, type_to_phpdoc, render_const_value, generate_serialize_field (protocol + binary inline), generate_deserialize_field (protocol + binary inline)

Tests:

  • Unit tests for UUID read/write in TBinaryProtocol and TCompactProtocol
  • Cross-test handler (testUuid) and client test cases
  • Updated test Makefiles and ThriftTest.thrift references to use the current version (with UUID fields)

Test plan

  • PHP unit tests pass (196/196 protocol tests, 0 failures)
  • Thrift compiler builds and generates correct PHP UUID code (TType::UUID, readUuid, writeUuid)
  • Generated ThriftTest_testUuid_args.php and ThriftTest_testUuid_result.php contain correct UUID serialization
  • Verified on ubuntu-noble (PHP 8.3) Docker environment

Implement UUID as a first-class type in the PHP library and code generator.
UUID (wire type 16, compact protocol type 0x0D) is represented as a string
in canonical form (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx), consistent with
the Python approach. Fixed 16 bytes on wire in binary/compact protocols,
JSON string in JSON protocol.

Changes:
- TType: add UUID = 16 constant
- TProtocol: add abstract writeUuid/readUuid, UUID cases in skip/skipBinary
- TBinaryProtocol: implement writeUuid/readUuid (16 raw bytes)
- TCompactProtocol: add COMPACT_UUID = 0x0D, type mappings, writeUuid/readUuid
- TJSONProtocol: add NAME_UUID = "uid", type mappings, writeUuid/readUuid
- TSimpleJSONProtocol: add writeUuid/readUuid
- TProtocolDecorator: add writeUuid/readUuid delegation
- TBase: add UUID to $tmethod dispatch array
- t_php_generator.cc: add TYPE_UUID in all code generation switch statements
- Update test configs to use current ThriftTest.thrift (with UUID fields)
- Add UUID unit tests for TBinaryProtocol and TCompactProtocol

Client: php
Patch: Volodymyr Panivko
CJCombrink (Contributor)

Nice work!!

Can you just ensure that the "on the wire" format is the same between your implementation and .NET or C++ or one of those languages? Visual confirmation is good enough.

For reference, see this comment I made and my observations on inconsistencies in #3144.

sveneld (Contributor, Author) commented Mar 9, 2026

Thanks for the review! Here's the visual confirmation of wire format compatibility between PHP and C++.

Test setup

Both tests were run inside an ubuntu-noble Docker container with the thrift compiler and C++ library built from this branch.

PHP test: writes the UUID via TBinaryProtocol::writeUuid() and TCompactProtocol::writeUuid() and prints the raw wire bytes:

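<?php
// Assumes the Thrift PHP library is autoloaded (e.g. via Composer).
use Thrift\Protocol\TBinaryProtocol;
use Thrift\Protocol\TCompactProtocol;
use Thrift\Transport\TMemoryBuffer;

$uuid = '550e8400-e29b-41d4-a716-446655440000';

// Binary protocol: write the UUID, then read back everything in the buffer.
$transport = new TMemoryBuffer();
$protocol = new TBinaryProtocol($transport);
$protocol->writeUuid($uuid);
$data = $transport->read(1024);
echo "PHP Binary:  " . bin2hex($data) . " (" . strlen($data) . " bytes)\n";

// Compact protocol: same UUID, separate buffer.
$transport2 = new TMemoryBuffer();
$protocol2 = new TCompactProtocol($transport2);
$protocol2->writeUuid($uuid);
$data2 = $transport2->read(1024);
echo "PHP Compact: " . bin2hex($data2) . " (" . strlen($data2) . " bytes)\n";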

C++ test: writes the same UUID via TBinaryProtocol::writeUUID() and TCompactProtocol::writeUUID():

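#include <cstdio>
#include <memory>
#include <string>
#include <thrift/TUuid.h>
#include <thrift/protocol/TBinaryProtocol.h>
#include <thrift/protocol/TCompactProtocol.h>
#include <thrift/transport/TBufferTransports.h>

using apache::thrift::TUuid;
using apache::thrift::protocol::TBinaryProtocol;
using apache::thrift::protocol::TCompactProtocol;
using apache::thrift::transport::TMemoryBuffer;

int main() {
  TUuid uuid("550e8400-e29b-41d4-a716-446655440000");
  // Binary protocol: write into an in-memory buffer and hex-dump it.
  auto buf = std::make_shared<TMemoryBuffer>();
  TBinaryProtocol proto(buf);
  proto.writeUUID(uuid);
  std::string data = buf->getBufferAsString();
  printf("C++ Binary:  ");
  for (unsigned char c : data) printf("%02x", c);
  printf(" (%zu bytes)\n", data.size());

  // Compact protocol: same UUID, separate buffer.
  auto buf2 = std::make_shared<TMemoryBuffer>();
  TCompactProtocol proto2(buf2);
  proto2.writeUUID(uuid);
  std::string data2 = buf2->getBufferAsString();
  printf("C++ Compact: ");
  for (unsigned char c : data2) printf("%02x", c);
  printf(" (%zu bytes)\n", data2.size());
}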

Output

PHP Binary:  550e8400e29b41d4a716446655440000 (16 bytes)
PHP Compact: 550e8400e29b41d4a716446655440000 (16 bytes)
C++ Binary:  550e8400e29b41d4a716446655440000 (16 bytes)
C++ Compact: 550e8400e29b41d4a716446655440000 (16 bytes)

Wire bytes are identical across both implementations and both protocols. The UUID's hex digits are written directly as 16 bytes in order, with no byte swapping.
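The "in order, no byte swapping" property can be checked mechanically. The hex dump of the 16 wire bytes should equal the canonical string with its hyphens removed. The C++ sketch below checks this for the bytes reported above. For contrast, it also shows what a hypothetical mixed-endian layout would produce instead; that layout reverses the first three groups, as some GUID APIs do. All helper names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Lowercase hex dump of raw bytes, like PHP's bin2hex().
std::string hex_dump(const std::vector<std::uint8_t>& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::uint8_t b : bytes) {
        out.push_back(digits[b >> 4]);
        out.push_back(digits[b & 0x0F]);
    }
    return out;
}

// Canonical UUID string with the hyphens removed.
std::string strip_hyphens(std::string s) {
    s.erase(std::remove(s.begin(), s.end(), '-'), s.end());
    return s;
}

// Hypothetical mixed-endian layout: reverse bytes 0-3, 4-5 and 6-7; leave the last 8 as-is.
std::vector<std::uint8_t> swap_first_three_groups(std::vector<std::uint8_t> b) {
    std::reverse(b.begin(), b.begin() + 4);
    std::reverse(b.begin() + 4, b.begin() + 6);
    std::reverse(b.begin() + 6, b.begin() + 8);
    return b;
}
```

A reader of the mixed-endian variant would see `00840e55-9be2-d441-a716-446655440000` instead of the original UUID. That is the kind of cross-language mismatch a visual byte comparison catches.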

Validate UUID format on write (all protocols) and on read (JSON protocol)
using the canonical regex pattern. Throws TProtocolException on invalid input.
sveneld (Contributor, Author) commented Mar 9, 2026

Also added UUID format validation in a follow-up commit. The writeUuid() method now validates input across all protocols (Binary, Compact, JSON), and readUuid() validates in JSON protocol (where UUID arrives as a string). Invalid UUIDs throw TProtocolException::INVALID_DATA.

This matches the approach used in the Ruby implementation (UUID.validate_uuid!).
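A regex-based check like the one described in the commit message might look like the C++ sketch below. The exact pattern used by the PHP library is not shown in this thread. The pattern here accepts upper- and lowercase hex with hyphens at the canonical positions, and that choice is an assumption. `InvalidUuidError` is a hypothetical stand-in for `TProtocolException::INVALID_DATA`.

```cpp
#include <cassert>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for TProtocolException::INVALID_DATA.
struct InvalidUuidError : std::invalid_argument {
    using std::invalid_argument::invalid_argument;
};

// Assumed canonical pattern: 8-4-4-4-12 hex digits; regex_match anchors the whole string.
bool is_canonical_uuid(const std::string& s) {
    static const std::regex pattern(
        "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");
    return std::regex_match(s, pattern);
}

// What a validating writeUuid()/readUuid() would do before touching the wire.
void validate_uuid_or_throw(const std::string& s) {
    if (!is_canonical_uuid(s)) throw InvalidUuidError("Invalid UUID format: " + s);
}
```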

CJCombrink (Contributor) commented Mar 9, 2026

> Also added UUID format validation in a follow-up commit. The writeUuid() method now validates input across all protocols (Binary, Compact, JSON), and readUuid() validates in JSON protocol (where UUID arrives as a string). Invalid UUIDs throw TProtocolException::INVALID_DATA.

I have been thinking about this, since the NodeJS side does validation when reading values. However, I don't think validation should happen for every field in every request: it adds unnecessary overhead for an application that is correct (and that probably does validation outside of the message exchange).

Perhaps it is useful for debugging, but not in operation.
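The "debugging, not operation" idea could be expressed as an opt-in check. The C++ sketch below assumes a hypothetical `validation_enabled` switch; no such option is described in this PR. It also uses a hand-rolled character check instead of a regex, to keep the per-field cost low when validation is turned on.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Cheap structural check: 36 chars, hyphens at 8/13/18/23, hex digits everywhere else.
bool looks_like_uuid(const std::string& s) {
    if (s.size() != 36) return false;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const bool hyphen_slot = (i == 8 || i == 13 || i == 18 || i == 23);
        const bool ok = hyphen_slot ? s[i] == '-'
                                    : std::isxdigit(static_cast<unsigned char>(s[i])) != 0;
        if (!ok) return false;
    }
    return true;
}

// Hypothetical opt-in hook: with validation disabled (e.g. in production) this is a no-op.
void check_uuid_if_enabled(const std::string& uuid, bool validation_enabled) {
    if (validation_enabled && !looks_like_uuid(uuid)) {
        throw std::invalid_argument("invalid UUID: " + uuid);
    }
}
```

The trade-off is the one raised above. With the switch off, a malformed value goes to the wire unchecked and surfaces (if at all) on the receiving side. With it on, every UUID field pays for a 36-character scan.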

@Jens-G What is your feeling around this?

Edit: see this commit I had to make on the nodets side, CJCombrink@8b5576b. Because of the incompatibility with Java, the Node side throws away the messages since the UUID is not valid (due to the wrong byte order on the wire). This made me question 'in-line' validation.
